NSF PAR Search | NSF Public Access Repository


Learning Neural Networks with Sparse Activations

Awasthi, Pranjal; Dikkala, Nishanth; Kamath, Pritish; Meka, Raghu (June 2024, Journal of machine learning research)
Learning Neural Networks with Sparse Activations

Awasthi, Pranjal; Dikkala, Nishanth; Kamath, Pritish; Meka, Raghu (June 2024, Proceedings of the 37th Conference on Learning Theory (COLT 2024))

A core component of many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that, after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, where there are neurons/weights which can be deleted from the network, this form of dynamic activation sparsity appears to be harder to exploit to get more efficient networks. Motivated by this, we initiate a formal study of PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a better theoretical understanding of sparsely activated networks would lead to methods that can exploit activation sparsity in practice.
Full Text Available
Open Problem: The Sample Complexity of Multi-Distribution Learning for VC Classes

Awasthi, Pranjal; Haghtalab, Nika; Zhao, Eric (September 2023, Proceedings of Machine Learning Research)
Neu, Gergely; Rosasco, Lorenzo (Ed.)
Full Text Available
Open Problem: The Sample Complexity of Multi-Distribution Learning for VC Classes

Awasthi, Pranjal; Haghtalab, Nika; Zhao, Eric (September 2023, The Thirty Sixth Annual Conference on Learning Theory)
Neu, Gergely; Rosasco, Lorenzo (Ed.)
Understanding Simultaneous Train and Test Robustness.

Awasthi, Pranjal; Balakrishnan, Sivaraman; Vijayaraghavan, Aravindan (January 2022, ALT)

Full Text Available
Active Sampling for Min-Max Fairness.

Abernethy, Jacob D; Awasthi, Pranjal; Kleindessner, Matthäus; Morgenstern, Jamie; Russell, Chris; Zhang, Jie (January 2022, International Conference on Machine Learning)

Full Text Available
Individual Preference Stability for Clustering

Ahmadi, Saba; Awasthi, Pranjal; Khuller, Samir; Kleindessner, Matthäus; Morgenstern, Jamie; Sukprasert, Pattara; Vakilian, Ali (January 2022, International Conference on Machine Learning)

Full Text Available
Estimating Principal Components under Adversarial Perturbations

Awasthi, Pranjal; Chen, Xue; Vijayaraghavan, Aravindan (January 2020, Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:323-362)
Full Text Available
Efficient Active Learning of Sparse Halfspaces with Arbitrary Bounded Noise

Zhang, Chicheng; Shen, Jie; Awasthi, Pranjal (January 2020, Advances in neural information processing systems)
Full Text Available
Estimating Principal Components under Adversarial Perturbations

Awasthi, Pranjal; Chen, Xue; Vijayaraghavan, Aravindan (January 2020, Proceedings of Thirty Third Conference on Learning Theory (PMLR))

Full Text Available

